minor class
Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition
Facial expression data is characterized by a significant imbalance, with most collected data showing happy or neutral expressions and fewer instances of fear or disgust. This imbalance poses challenges to facial expression recognition (FER) models, hindering their ability to fully understand various human emotional states. Existing FER methods typically report overall accuracy on highly imbalanced test sets but exhibit low performance in terms of the mean accuracy across all expression classes. In this paper, our aim is to address the imbalanced FER problem. Existing methods primarily focus on learning knowledge of minor classes solely from minor-class samples.
- Asia > Middle East > Israel (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (2 more...)
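The gap between overall accuracy and mean per-class accuracy that the abstract above describes can be illustrated with a small sketch. The label names and the 90/10 split are made-up toy values, not data from the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced test set: 90 "happy" samples, 10 "fear" samples.
y_true = ["happy"] * 90 + ["fear"] * 10
# A degenerate model that always predicts the major class.
y_pred = ["happy"] * 100

overall = overall_accuracy(y_true, y_pred)
mean_acc = mean_class_accuracy(y_true, y_pred)
```

On this toy set the always-"happy" model scores well on overall accuracy while its mean class accuracy is only one half, because the minor class is never recognized.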
Adaptive Cluster-Based Synthetic Minority Oversampling Technique for Traffic Mode Choice Prediction with Imbalanced Dataset
Urban datasets such as citizen transportation modes often contain disproportionately distributed classes, posing significant challenges to the classification of under-represented samples using data-driven models. In the literature, various resampling methods have been developed to create synthetic data for minority classes (oversampling) or remove samples from majority classes (undersampling) to alleviate class imbalance. However, oversampling approaches tend to overgeneralize minor classes that are closely clustered and neglect sparse regions which may contain crucial information. Conversely, undersampling methods potentially remove useful information on certain subgroups. Hence, a resampling approach that takes the inherent distribution of data into consideration is required to ensure appropriate synthetic data creation. This study proposes an adaptive cluster-based synthetic minority oversampling technique. Density-based spatial clustering is applied on minority classes to identify subgroups based on their input features. The classes in each of these subgroups are then oversampled according to the ratio of data points of their local cluster to the largest majority class. When used in conjunction with machine learning models such as random forest and extreme gradient boosting, this oversampling method results in significantly higher F1 scores for the minority classes compared to other resampling techniques. These improved models provide accurate classification of transportation modes.
- Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition
Park, Sanglee, Hwang, Seung-won, So, Jungmin
Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can be unobserved in classes with scarce samples. As a result, these background features become correlated with biased predictions toward "major" classes. In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate the problem and improve the generalizability of a model. Our key idea is to mask the important part of an image using saliency detection and use contrastive learning to move the masked image towards minor classes in the feature space, so that background features present in the masked image are no longer correlated with the original class. Experimental results show that our method achieves state-of-the-art level performance on benchmark long-tailed datasets.
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > Canada > Ontario > Toronto (0.04)
Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering
Hong, Sungchul, An, Seunghwan, Jeon, Jong-June
Recent advances in generative neural network models have extended the development of data augmentation methods. However, augmentation methods based on modern generative models fail to achieve notable performance on class-imbalanced data compared to the conventional model, SMOTE. We investigate the problem of generative models for imbalanced classification and introduce a framework to enhance the SMOTE algorithm using Variational Autoencoders (VAE). Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty. Then, the data points potentially degrading the augmentation are systematically excluded, and the neighboring observations are directly augmented in the data space. Empirical studies on several imbalanced datasets show that this simple process improves the conventional SMOTE algorithm over the deep learning models. Consequently, we conclude that the selection of minority data and the interpolation in the data space are beneficial for imbalanced classification problems with a relatively small number of data points.
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
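The data-space interpolation that conventional SMOTE performs, which the abstract above builds on, can be sketched as follows. This is a minimal illustration of plain SMOTE only; it does not include the VAE-based filtering the abstract describes, and the function name, the value of k, and the toy points are assumptions made for illustration.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = rng or random.Random()
    i = rng.randrange(len(minority))
    base = minority[i]
    # Rank every other minority point by Euclidean distance to the base point.
    others = [p for j, p in enumerate(minority) if j != i]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    lam = rng.random()  # interpolation weight in [0, 1)
    return tuple(b + lam * (n - b) for b, n in zip(base, neighbour))


# Hypothetical 2-D minority samples.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sample(minority_points, k=2, rng=random.Random(0))
```

Each synthetic point lies on the line segment between two existing minority samples, so it always stays inside the region the minority class already occupies.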
Exploring Prompting Methods for Mitigating Class Imbalance through Synthetic Data Generation with Large Language Models
Kim, Jinhee, Kim, Taesung, Choo, Jaegul
Large language models (LLMs) have demonstrated impressive in-context learning capabilities across various domains. Inspired by this, our study explores the effectiveness of LLMs in generating realistic tabular data to mitigate class imbalance. We investigate and identify key prompt design elements such as data format, class presentation, and variable mapping to optimize the generation performance. Our findings indicate that using CSV format, balancing classes, and employing unique variable mapping produces realistic and reliable data, significantly enhancing machine learning performance for minor classes in imbalanced datasets. Additionally, these approaches improve the stability and efficiency of LLM data generation.
- Asia > South Korea > Daejeon > Daejeon (0.04)
- North America > United States > California (0.04)
- North America > Nicaragua (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
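Two of the prompt design elements named in the abstract above, CSV formatting and class balancing, can be sketched in a few lines. This is an assumed illustration, not the authors' prompt template: the function name, the instruction sentence, the column names, and the toy rows are all hypothetical, and variable mapping is not covered.

```python
import csv
import io
from collections import defaultdict


def balanced_csv_prompt(rows, label_key, per_class):
    """Build a prompt whose in-context examples are CSV rows with the
    same number of examples taken from every class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    # Take the first `per_class` rows of each class, in sorted label order.
    picked = [r for label in sorted(by_class) for r in by_class[label][:per_class]]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(picked)
    # Hypothetical instruction text; a real prompt would be tuned per task.
    return "Generate more rows in the same CSV format:\n" + buf.getvalue()


# Hypothetical imbalanced tabular data: three "major" rows, one "minor" row.
rows = [
    {"age": 30, "income": 50, "label": "major"},
    {"age": 41, "income": 62, "label": "major"},
    {"age": 25, "income": 38, "label": "major"},
    {"age": 67, "income": 12, "label": "minor"},
]
prompt = balanced_csv_prompt(rows, label_key="label", per_class=1)
```

With `per_class=1` the prompt shows one example of each class, so the model sees the minor class as often as the major one even though the source data is skewed.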
Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning
Moon, Jun-Yeong, Park, Keon-Hee, Kim, Jung Uk, Park, Gyeong-Moon
Continual learning aims to learn a model from a continuous stream of data, but it mainly assumes a fixed amount of data and a fixed number of tasks with clear task boundaries. However, in real-world scenarios, the amount of input data and the number of tasks change constantly in a statistical way, not a static way. Although recently introduced incremental learning scenarios with blurry task boundaries somewhat address the above issues, they still do not fully reflect the statistical properties of real-world situations because of the fixed ratio of disjoint and blurry samples. In this paper, we propose a new Stochastic incremental Blurry task boundary scenario, called Si-Blurry, which reflects the stochastic properties of the real world. We find that there are two major challenges in the Si-Blurry scenario: (1) inter- and intra-task forgetting and (2) the class imbalance problem. To alleviate them, we introduce Mask and Visual Prompt tuning (MVP). In MVP, to address the inter- and intra-task forgetting issues, we propose a novel instance-wise logit masking and a contrastive visual prompt tuning loss. Both help our model discern the classes to be learned in the current batch, which consolidates previous knowledge. In addition, to alleviate the class imbalance problem, we introduce a new gradient similarity-based focal loss and adaptive feature scaling to ease overfitting to the major classes and underfitting to the minor classes. Extensive experiments show that our proposed MVP significantly outperforms the existing state-of-the-art methods in our challenging Si-Blurry scenario.
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Education > Educational Setting > Online (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.40)
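For context on the focal loss mentioned in the abstract above, here is a minimal sketch of the standard focal loss for a single sample. The gradient similarity-based variant that the abstract proposes is not specified in this listing and is not reproduced here; the function names and the default gamma of 2.0 are assumptions chosen for illustration.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy for one sample, given the predicted probability
    of its true class (must be in (0, 1])."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss: cross-entropy scaled by (1 - p_true) ** gamma,
    which shrinks the contribution of already well-classified samples."""
    return ((1.0 - p_true) ** gamma) * cross_entropy(p_true)


easy = focal_loss(0.9)  # confident, correct prediction
hard = focal_loss(0.1)  # poorly classified sample
```

An easy sample keeps only a small fraction of its cross-entropy while a hard sample keeps most of it, so training signal shifts toward the samples the model still gets wrong, which are often the minor-class ones.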
GraphSHA: Synthesizing Harder Samples for Class-Imbalanced Node Classification
Li, Wen-Zhi, Wang, Chang-Dong, Xiong, Hui, Lai, Jian-Huang
Class imbalance is the phenomenon that some classes have far fewer instances than others, which is ubiquitous in real-world graph-structured scenarios. Recent studies find that off-the-shelf Graph Neural Networks (GNNs) would under-represent minor class samples. We investigate this phenomenon and discover that the subspaces of minor classes being squeezed by those of the major ones in the latent space is the main cause of this failure. We are naturally inspired to enlarge the decision boundaries of minor classes and propose a general framework GraphSHA by Synthesizing HArder minor samples. Furthermore, to avoid the enlarged minor boundary violating the subspaces of neighbor classes, we also propose a module called SemiMixup to transmit enlarged boundary information to the interior of the minor classes while blocking information propagation from minor classes to neighbor classes. Empirically, GraphSHA shows its effectiveness in enlarging the decision boundaries of minor classes, as it outperforms various baseline methods in class-imbalanced node classification with different GNN backbone encoders over seven public benchmark datasets. Code is available at https://github.com/wenzhilics/GraphSHA.
- North America > United States > California > Los Angeles County > Long Beach (0.15)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (18 more...)